Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries

Authors

  • Jorge Graña Gil
  • Francisco-Mario Barcala
  • Miguel A. Alonso
Abstract

We present a reflection on the evolution of the different methods for constructing minimal deterministic acyclic finite-state automata from a finite set of words. We outline the most important methods, including the traditional ones (which combine two phases: insertion of words and minimization of the partial automaton) and the incremental algorithms (which add new words one by one and minimize the resulting automaton on the fly, and which are much faster and have significantly lower memory requirements). We analyze their main features in order to provide some improvements for incremental constructions, and a general architecture that is needed to implement large dictionaries in natural language processing (NLP) applications.
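
As an illustration of the incremental approach mentioned in the abstract, the following is a minimal Python sketch in the style of the sorted-input algorithm of Daciuk et al. (2000). It is not the authors' implementation, and it does not include the improvements the paper proposes. It assumes the words arrive in lexicographic order; the names State, replace_or_register, build_minimal_dafsa, accepts and count_states are hypothetical and chosen only for this example.

```python
class State:
    """One automaton state: a finality flag plus labelled outgoing edges."""

    def __init__(self):
        self.final = False
        self.edges = {}  # char -> State; dict order is insertion order

    def signature(self):
        # Two states are equivalent when they agree on finality and on
        # every (label, target) pair. Targets are compared by identity,
        # which is safe because children are canonicalized before parents.
        return (self.final, tuple((c, id(t)) for c, t in self.edges.items()))


def replace_or_register(state, register):
    """Minimize the most recently added branch below `state`."""
    last_char = next(reversed(state.edges))  # newest edge (dicts are ordered)
    child = state.edges[last_char]
    if child.edges:
        replace_or_register(child, register)
    # Reuse an equivalent registered state, or register this one.
    state.edges[last_char] = register.setdefault(child.signature(), child)


def build_minimal_dafsa(sorted_words):
    """Build a minimal acyclic automaton; input must be sorted (assumption)."""
    root = State()
    register = {}  # signature -> canonical State
    previous = ""
    for word in sorted_words:
        if word < previous:
            raise ValueError("input must be sorted lexicographically")
        # 1. Follow the prefix shared with the words already inserted.
        state, i = root, 0
        while i < len(word) and word[i] in state.edges:
            state = state.edges[word[i]]
            i += 1
        # 2. The previous word's remaining suffix can no longer change,
        #    so it can be minimized right away.
        if state.edges:
            replace_or_register(state, register)
        # 3. Append the new suffix as a fresh chain of states.
        for ch in word[i:]:
            nxt = State()
            state.edges[ch] = nxt
            state = nxt
        state.final = True
        previous = word
    if root.edges:
        replace_or_register(root, register)
    return root


def accepts(root, word):
    """Return True if `word` is in the language of the automaton."""
    state = root
    for ch in word:
        state = state.edges.get(ch)
        if state is None:
            return False
    return state.final


def count_states(root):
    """Count the distinct states reachable from `root`."""
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.edges.values())
    return len(seen)
```

For the sorted list tap, taps, top, tops, a plain trie needs 8 states, while this sketch yields 5, because the suffixes "p" and "ps" are shared between the two branches. Only the path of the most recently inserted word stays unminimized at any moment, which is the source of the lower memory use claimed for incremental methods.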

Similar resources

Applications of Finite Automata Representing Large Vocabularies

The construction of minimal acyclic deterministic partial finite automata to represent large natural language vocabularies is described. Applications of such automata include: spelling checkers and advisers, multilanguage dictionaries, thesauri, minimal perfect hashing and text compression. Part of this research was supported by a grant awarded by the Brazilian National Council for Scientific and ...

Incremental Construction of Minimal Acyclic Sequential Transducers from Unsorted Data

This paper presents an efficient algorithm for the incremental construction of a minimal acyclic sequential transducer (ST) for a dictionary consisting of a list of input and output strings. The algorithm generalises a known method of constructing minimal finite-state automata (Daciuk et al., 2000). Unlike the algorithm published by Mihov and Maurel (2001), it does not require the input strings...

Incremental Construction and Maintenance of Minimal Finite-State Automata

Daciuk et al. [Computational Linguistics 26(1):3–16 (2000)] describe a method for constructing incrementally minimal, deterministic, acyclic finite-state automata (dictionaries) from sets of strings. But acyclic finite-state automata have limitations: For instance, if one wants a linguistic application to accept all possible integer numbers or Internet addresses, the corresponding finite-state a...

Compiling Apertium morphological dictionaries with HFST and using them in HFST applications

In this paper we aim to improve interoperability and re-usability of the morphological dictionaries of the Apertium machine translation system by formulating a generic finite-state compilation formula that is implemented in the HFST finite-state system to compile Apertium dictionaries into general purpose finite-state automata. We demonstrate the use of the resulting automaton in FST-based spell-checki...

How to squeeze a lexicon

Minimal acyclic deterministic finite automata (ADFAs) can be used as a compact representation of finite string sets with fast access time. Creating them with traditional algorithms of DFA minimization is a resource hog when a large collection of strings is involved. This paper aims to popularize an efficient but little known algorithm for creating minimal ADFAs recognizing a finite language, in...

Publication date: 2001